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ABSTRACT 


Major processor manufacturers have embraced the high-level synthesis (HLS) design phi¬ 
losophy. HLS offers the potential to explore the design space of electronic circuits and 
systems more efficiently than traditional methods. In this thesis, we investigate the appli¬ 
cation of HLS to hardware-oriented security and trust by developing a model of a simple 
16-bit Central Processing Unit in the SystemC modeling language. We enhanced our pro¬ 
cessor with a simple security mechanism that enforces a memory integrity policy. The in¬ 
tegrity policy allows a region of the program labeled as trustworthy to modify any address 
in data memory, but another region of the program labeled as untrustworthy is restricted 
to only being able to modify a specific region of data memory. Our timing results show 
that adding the integrity policy enforcement mechanism has a negligible effect on overall 
system performance. 
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Executive Summary 


Major processor manufacturers have embraced the high-level synthesis (HLS) design phi¬ 
losophy. For example, Xilinx has incorporated HLS into its Vivado suite of Electronic 
Design Automation (EDA) tools. HLS offers the potential to explore the design space of 
electronic circuits and systems more efficiently than traditional methods. The HLS design 
process begins with a functional model that is iteratively refined to progressively finer lev¬ 
els of detail, eventually resulting in a cycle-accurate model of the system. In this thesis 
we investigate the application of HLS to hardware-oriented security and trust (HOST) by 
developing a model of a simple 16-bit CPU in the SystemC modeling language. Using Sys- 
temC, designers can express both hardware and software constructs in C++; therefore, the 
hardware and software of an embedded system can be simulated in the same environment, 
rather than using separate hardware and software simulators. Only a C++ compiler and the 
SystemC library are needed to design and simulate a circuit. 

Our processor is based on the design from the “Nand to Tetris” course that teaches computer 
science concepts across all levels of system abstraction by constructing a general-purpose 
computer system from the ground up, starting with digital logic gates and then progressing 
to a 16-bit CPU architecture, assembler, computer, and high-level language programming. 
To demonstrate the applicability of the HLS design approach to hardware-oriented security 
and trust, we enhanced our processor with a simple security mechanism that enforces a 
memory integrity policy. The integrity policy allows a region of the program labeled as 
trustworthy to modify any address in memory, but another region labeled as untrustworthy 
is restricted to only being able to modify a specific region of memory. Our timing results 
show that adding the integrity policy enforcement mechanism has a negligible effect on 
overall system performance. HLS has the potential to help designers of security enhance¬ 
ments as well as designers of the systems themselves, and a SystemC approach has the 
potential to make hardware design more accessible to computer scientists. Luture work 
will involve exploring a wider variety of programs, policies, policy enforcement mecha¬ 
nisms, and processors, as well as increasing the memory size, as cycle-accurate modeling 
of a large number of memory cells requires a very large RAM overhead, which is a known 
challenge with SystemC modeling. 
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CHAPTER 1: 

Introduction and Motivation 


Trustworthy system development is a major concern for the Department of Defense, which 
operates a large variety of complex systems that must be resilient to a wide array of de¬ 
velopmental and operational attacks. Developing trustworthy systems is expensive due to 
the high non-recurring engineering (NRE) costs of developing hardware and software, and 
the small customer base over which to amortize that high NRE cost. In addition, system 
designers are increasingly concerned with the security of the entire supply chain, including 
hardware [1], [2] and design tools [3]. Compromised hardware has the potential to under¬ 
mine policy enforcement mechanisms implemented in software. Addressing the security 
problem of the supply chain of electronics is very challenging due to the large number 
of vendors of electronic components and intellectual property (IP). Building hardware in 
a trusted foundry is one approach to addressing these issues, but building custom hard¬ 
ware in this way is costly. In addition to the fabrication costs, which increase according to 
Rock’s law in super linear fashion, the engineering costs are also very high [4], [5]. Even 
using reconfigurable hardware, such as field-programmable gate arrays (FPGAs), does not 
necessarily reduce the design cost although it may reduce the fabrication cost. 

The goal of this thesis is to reduce the cost and time for developing trustworthy hardware 
by leveraging high-level synthesis (HLS) to efficiently explore the design space in order to 
determine which design point optimally balances the tradeoffs of concern for the customer. 
HLS allows for the development of functional models at a high level of abstraction that can 
be quickly implemented in software [6]. Further refinement of a functional model results 
in a transactional model, and further refinement of a transactional model results in a timing 
model. Finally, the cycle-accurate model is the lowest level of abstraction and the finest 
level of granularity. 

Development and simulation of a complete cycle-accurate model is too expensive for all 
points in the design space. Therefore, the HLS methodology relies on quickly building 
coarse-grained models (e.g., the functional models) to quickly determine important metrics 
such as power and performance at a coarse level of granularity. By allowing the designer to 
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evaluate tradeoffs efficiently at a coarse level of granularity, HLS enables the design process 
to be more efficient than traditional methods. The designer can make important decisions 
at this stage before embarking on the tedious efforts and expensive costs of refining the 
coarse-grained design down to a cycle-accurate, fine-grained model. The beauty of this 
approach is that once the optimal point within the design space is determined, the high- 
level model can immediately be utilized, and the process of refinement can begin. 

A major language for system modeling is SystemC, which is based on the C++ program¬ 
ming language. SystemC is very simple and consists of a C++ library that can be readily 
downloaded for free. SystemC allows a designer to express a functional model in a mod¬ 
ified C++ language. In addition to expressing software, SystemC provides the advantage 
of being able to design hardware in this language. This is an improvement over traditional 
techniques in which software is designed in a traditional programming language, and hard¬ 
ware is designed in a traditional hardware description language, or HDL. The problem with 
the traditional approach is that the hardware is simulated in a hardware simulator, while the 
software is simulated in a software simulation environment. Having separate simulation 
environments for hardware and software is inefficient and inhibits the ability to co-design 
the hardware and software. 

We are not the first to apply HLS to hardware trust. Bathen and Dutt developed PoliMakE, 
which uses HLS to explore policies for multi-core processors [7]. PoilMakE is built on top 
of their SystemC simulation engine. SystemC is also used in PHiLOSoftware, which helps 
engineers design trustworthy systems based on multi-core processors [8]. 

Concerns about information security for modern computers have existed nearly since their 
inception. With the widespread use of modern computers, the need to provide information 
security became more evident [9]. Saltzer and Schroeder focus on safeguarding information 
for systems with multiple users on the same system [9]. As malicious software emerged, 
including computer viruses, worms, and Trojans, patches and firewalls were developed as 
countermeasures. While malicious software poses a tremendous challenge, recognition 
of the problem of malicious hardware has emerged, as the integrated circuit supply chain 
is world-wide. The potential for hardware breaches has increased as global consumption 
relies more heavily on outsourced equipment [2]. This thesis will assess and demonstrate 
how high-level synthesis (HLS) can facilitate the design of policy enforcement circuitry. 
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CHAPTER 2: 
Related Work 


The fields of computer security and computer hacking have evolved over time. Just as 
programmers work to protect system security, skilled and motivated hackers will attempt 
to exploit weaknesses in the protection mechanisms. 

Multiple publications touch on the matter of safeguarding computer systems. One of the 
most seminal of these works is J. H. Saltzer and Michael D. Schroeder’s “The Protection of 
Information in Computer Systems,” written in 1975 [9]. The authors of this work discuss 
how the invention of the Von Neumann general-purpose architecture drastically reduced 
the production cost of modern computers, which allowed wide spread use of the machines. 
Saltzer and Schroeder wrote their work with the “key concern” of safeguarding information 
against multiple users on the same system. As computer users and designers get savvier in 
their attempts to prevent software security breaches, malicious hackers must find new areas 
to attack, where advanced security has not yet been implemented. A more recent pub¬ 
lication entitled “Trustworthy Hardware: Identifying and Classifying Hardware Trojans” 
addresses the possibility of security threats introduced at the hardware level of modern 
computing. The potential for hardware breaches has increased as global consumption re¬ 
lies more heavily on outsourced equipment [2]. 

Karri et al. survey the emerging discipline of hardware oriented security and trust [2]. In 
a world in which hackers develop sophisticated exploits, they devise novel ways to bypass 
security mechanisms. Hardware vulnerabilities represent a means for such an exploit to 
occur. Karri et al. present a taxonomy of malicious circuitry for classifying malicious 
inclusions and countermeasures. 

The Department of Defense (DOD), like the rest of the information technology world, 
finds itself reliant on global outsourcing for the manufacturing of digital infrastructure. 
Therefore, the DOD is interested in mitigating supply chain threats by using enhanced 
government services, encouraging improved commercial practices, and requiring supply 
chain risk management [10]. While the NSA has established dozens of trusted foundries, 
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manufacturing all military electronics in trusted foundries may not be feasible. Design 
and manufacture of all hardware and software intellectual property in house is expensive, 
time consuming, and might not yield a defect-free result [2]. The vulnerability exists for 
untrusted foundries to maliciously modify a circuit without user knowledge. 

A three-year investigation conducted by the Federal Bureau of Investigation (FBI), from 
2004-2006, discovered “counterfeit Cisco routers in U.S. defense, finance, and university 
networks” [11]. Even more alarming than the actual discovery of the compromised hard¬ 
ware was the fact that many of the routers came directly from “untrustworthy sources in 
foreign countries.” While the investigation did not detect malicious hardware injections 
and concluded that the infiltrator’s motivations appeared merely fiscal, the investigation 
“vividly illustrate[d] the vulnerability” users face when purchasing unverified hardware [2, 
p. 39], 

Karri et al. suggest the creation of a “hardware Trojan taxonomy” to address the possi¬ 
ble introduction of malicious circuitry made in “untrusted factories” [2]. They state, “To 
be trustworthy, hardware must exhibit only the functionality for which it was designed, 
nothing more and nothing less; conceal any information about the computation performed 
through any side channels such as power and delay; and be transparent only to the designer 
while remaining opaque to others, who should know nothing about its design and internal 
states” [2]. They also provide a more detailed definition of a hardware Trojan as “a mali¬ 
cious and deliberately stealthy modification made to an electronic device such as an IC. It 
can change the chip’s functionality and thereby undermine trust in the systems using that 
trojaned chip” [2]. 

The hardware Trojan taxonomy created by Karri et al. is based on five different categories: 
the insertion phase, abstraction level, activation mechanism, effects, and location [2]. The 
insertion phase covers all possible points at which a hacker could maliciously alter hard¬ 
ware and remain undetected throughout the testing cycle. The activation mechanism phase 
deals with potential points at which a hardware Trojan could be activated. For example, 
the hardware Trojan could be either “always on” or “triggered” by an external event. The 
effects category addresses four generalized results the Trojan could create; it could “change 
the functionality, downgrade performance, leak information, [or] den[y] service” [2]. With 
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the hardware Trojan taxonomy established, the real work-identifying and preventing hard¬ 
ware Trojan implementation-can begin. 

A follow-on article entitled “Trustworthy Hardware: Trojan Detection and Design-for- 
Trust Challenges” delves even further into installing safeguards against potential malicious 
hardware elements [1], The reliance of globalization and outsourcing of the semiconductor 
industry is again cited for creating the increased vulnerability of hardware Trojans [1]. To 
offset this challenge, the authors suggest “either the developer must make the IC design and 
fabrication process trustworthy or the client must verify the IC for trustworthiness” [1]. 

The trust element of chip manufacturing is a primary concern for both the Department of 
Defense and the general consumer market as a whole. In addition to trustworthiness of de¬ 
sign, cost, performance, and functionality are important considerations in creating a “win¬ 
ning design.” Kurt Keutzer states, “The overall goal of electronic embedded system design 
is to balance production cost with development time and cost in view of performance and 
functionality considerations. Manufacturing cost depends mainly on the hardware (HW) 
components of the product” [4]. 

Our work explores the potential benefits of HLS for trustworthy system development. Ma¬ 
jor chip manufacturers such as Xilinx and Intel have adopted HLS in their design practices, 
and we argue that HLS can also facilitate the design of policy enforcement mechanisms. 
To validate our hypothesis, we construct a general-purpose computer in the SystemC lan¬ 
guage and enhance it with a simple memory integrity policy enforcement mechanism. We 
then evaluate system performance both with and without the security enhancement. While 
our work falls within the scope of hardware-oriented security and trust, we do not claim to 
directly address the problem of malicious hardware inclusions in our work. 
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CHAPTER 3: 
Design Flow 


In our efforts to demonstrate the capability of utilizing HLS in the design of pol¬ 
icy enforcement circuity, we chose to follow the coursework of Nisan and Schocken 
[12]. The Nisan and Schocken text is complemented with online material found at 
http://www.nand2tetris.org [12]. The course is often simply referred to as "nand2tetris." 
The premise of their work is to help both individuals with and without computer science 
backgrounds to comprehend the process of building a modern computing machine, as well 
as to implement software to run on the machine. Our project focuses on the first portion of 
their text and coursework—the construction of the modem computer. 

The Nisan and Schocken course [12] incrementally builds upon logic gates to construct 
a 16-bit modern computer—named HACK. Modern computers are built with transistors 
that physically implement simple Boolean functions, called logic gates. One of the most 
fundamental logic gates, described in more detail in the section below, is the NAND gate. 
Logically, the NAND gate performs two functions. First, it takes two inputs and performs 
the AND function on them. The truth table produced from this function is shown as Table 
3.1. The next logic operation is the NOT function, which yields the inversion of the original 
input. The truth table for the NOT function is provided in Table 3.2. Combining both of 
these functions together produces the results shown in Table 3.3. As explained below, 
NAND gates are universal building blocks for combinational circuitry. 


a 

b 

out 

0 

0 

0 

0 

1 

0 

1 

0 

0 

1 

1 

1 


Table 3.1: The truth table for the AND function of discrete math. 


The HACK computer is capable of performing simple 16-bit operations. Ultimately, there 
are 28 functions the ALU can compute (a complete list of the 28 computations is found 
on page 67 of Nisan and Shocken’s book). This chapter follows the development of each 
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IN 

OUT 

0 

1 

1 

0 


Table 3.2: The NOT function's truth table. 


A 

B 

(AB)’ 

0 

0 

1 

0 

1 

1 

1 

0 

1 

1 

1 

0 


Table 3.3: The NAND gate's truth table. 


logic gate in the manner described by Nisan and Shocken, and describes how it relates 
to the overall development of the HACK computer. By the end of this chapter, all logic 
gates needed to construct the HACK machine have been created and connected to yield the 
HACK machine in SystemC. 

3.1 “In the beginning, there was NAND...” 

Because of their universality, NAND gates are fundamental building blocks of all combi¬ 
national circuits. Electrical engineers can easily build NAND gates out of only a handful 
of transistors. All other logical gates can be constructed using NAND gates. 

We created the NAND gate in SystemC by declaring two boolean inputs A and B. The 
NAND gate logically "ands" the two boolean inputs then negates that result, which pro¬ 
duces the output. In the SystemC code used for this project, F represents the final boolean 
output of the NAND gate. 

To properly construct and run the NAND gate in SystemC, the boolean inputs A and B 
are logically "anded" together, and then their result is "notted" via the function ‘do_nand2.’ 
Figure 3.1 illustrates the composition of the NAND gate. 
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A 


p-F 

H_ J 

NAND gate 

Figure 3.1: The "NAND" gate is a fundamental building block for digital logic designs. 

The following is the actual SystemC code for the NAND gate: 

SC_M0DULE(nand2) // declare nand2 sc_module 

{ 

sc_in<bool> A, B; // input signal ports 

sc_out<bool> F; // output signal ports 

void do_nand2() // a C++ function 

{ 

F.write( !(A.read() kk B.readO) ); 

} 

SC_CT0R(nand2) // constructor for nand2 

{ 

SC_METH0D(do_nand2); // register do_nand2 with kernel 

sensitive << A « B; // sensitivity list 

> 

>; 

3.2 Next came ... AND 

All additional logic gates required to build a digital computer can be derived from NAND 
gates. An AND gate in SystemC consists of two NAND gates wired together. The AND 
gate handles two boolean inputs A and B, uses one internal boolean signal SI, and produces 
a boolean output F. The SC_CTOR sets up the digital layout of the logic device by feeding 
A and B into the first NAND gate, nl. The output of nl, SI, is then fed as both the A and 
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B inputs to the second NAND gate, n2. This procedure yields a final output F for the AND 
gate. Figure 3.2 illustrates the composition of the AND gate: 



The AND gate 

Figure 3.2: The AND gate utilizes two NAND gates to produce its output. 
Here is the SystemC code for an AND gate: 

SC_M0DULE(_and2) 

{ 

sc_in<bool> A, B; 
sc_out<bool> F; 

nand2 nl, n2; 

sc_signal<bool> SI; 

SC_CT0R(_and2) : nl("Nl"), n2("N2") 

{ 

nl.A(A); 
nl.B(B); 
nl.F(Sl); 

n2.A(Sl); 
n2.B(Sl); 
n2.F(F); 

> 

>; 
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3.3 Many more logic gates are now possible 

We construct an OR gate in SystemC by utilizing three NAND gates, nl, n2 , and nS. The 
logical design created utilizing the SC_CTOR accepts two boolean inputs, A and B , gen¬ 
erates two internal signals, SI and 52, and produces a final boolean output, F. The first 
NAND gate nl uses A for both its inputs and outputs SI. The second NAND gate n2 uses 
B for both its inputs and produces signal 52. The third NAND gate n3 accepts SI and 52 as 
inputs, yielding F as the final output. Figure 3.3 illustrates the composition of the OR gate. 



The OR gate 

Figure 3.3: The OR gate employs three NAND gates to produce output F. 
Here is the SystemC code for the OR gate: 

SC_M0DULE(_or2) 

{ 

sc_in<bool> A, B; 
sc_out<bool> F; 

nand2 nl, n2, n3; 

sc_signal<bool> SI, S2; 

SC_CT0R(_or2) : nl("Nl"), n2("N2"), n3("N3") 

{ 


nl.A(A); 
nl.B(A); 
nl.F(Sl); 
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n2.A(B); 
n2.B(B); 
n2.F(S2); 

n3.A(Sl); 
n3.B(S2); 
n3.F(F); 

> 

>; 

3.4 Have you ever dealt with a NOT? 

Constructing a NOT gate in SystemC merely requires one internal NAND gate, n. IN 
becomes both inputs to the NAND gate n. The output from n yields the final output F. 
Figure 3.4 illustrates the composition of the NOT gate. 



Figure 3.4: The NOT gate merely requires one NAND gate. 
Here is the SystemC code for the NOT gate: 

SC_M0DULE(_not1) 

{ 

sc_in<bool> IN; 
sc_out<bool> OUT; 

nand2 n; 

SC_CTOR(_notl) : n("N") 
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{ 

n.A(IN); 
n.B(IN); 
n.F(OUT); 

> 

>; 

3.5 The XOR gate 

Creating an XOR gate in SystemC requires four NAND gates, nl, n2, n3, and ti4. We 
create the logical circuit by connecting two boolean inputs (A and B ) to the NAND gates 
as shown in Figure 3.5. In addition to the two boolean inputs, the XOR circuit uses three 
internal boolean signals and produces the final boolean output (F). 



Figure 3.5: The XOR gate requires four NAND gates. 

3.6 The Multiplexor (a.k.a "Mux") 

To build a multiplexor (aka Mux ) in SystemC, we utilize the logic gates described above. 
The Mux requires two AND gates, one NOT gate, one OR gate, and three input booleans 
(A, B , and SEL). It uses three internal signals, SI, S2, and NOTSEL, to produce a final 
output F. Figure 3.6 illustrates the composition of a Mux. 
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SEL 



Figure 3.6: The Mux consists of two AND gates, one NOT gate, and one OR gate 


3.7 D-Mux that, please 

A demultiplexer, or D-Mux, performs the inverse function of the multiplexor. Rather than 
selecting between two inputs, the D-Mux "is an output selector which has a single input 
and directs it to one of N outputs" [13]. 

Constructing a D-Mux requires the use of two AND gates, al and c/2, and one NOT gate, 
n. The SC_CTOR wires the logic design as follows: external boolean input IN is wired as 
one of the inputs required for both al and a2. An additional external value, SEL is wired 
directly to AND gate c/2 as its second required input. SEL is also run through the NOT gate 
n where its result, NOTSEL, is used as the second input for AND gate al. The output of al 
is A. The output of c/2 is B. Depending on the value of the SEL bit, the initial input value 
IN will be passed through as output A or output B. Figure 3.7 illustrates the composition of 
a demultiplexer. 



Figure 3.7: The D-Mux consist of two AND gates and one NOT gate. 
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3.8 Handling larger input arrays 

Each of the previously described elementary logic gates in our digital design handles single¬ 
bit inputs, but for the machine we are building, 16-bit buses must be dealt with. By com¬ 
bining multiple single-bit gates, the 16-bit input buses can be processed. In the case of 
processing a 16-bit value through the NOT gates, we combine sixteen NOT gates to form a 
NOT_16 gate. The composition of the NOT_16 circuit is illustrated in Figure 3.8. 


IN[0] 
IN[1] 
IN[2] 
IN[3] 
IN[4] 
IN[5] 
IN[6] 
IN [7] 
IN[8] 
IN[9] 
IN[I0] 
IN[11] 
IN[12] 
IN[13] 
IN[14] 
IN[15] 



F[0J 

F[1J 

F[2J 

F[3] 

F[4J 

F[5J 

F[6J 

F[7] 

F[8] 

F[9] 

F[10J 

Fill] 

F[12J 

F[13] 

F[14J 

F[15J 


The "NOT_16" gate 


Figure 3.8: Sixteen NOT gates are placed together to handle a 16-bit input bus. 


A simpler visual display is provided in Figure 3.9. In this diagram, the sixteen bits are all 
fed into the NOT_16 gate simultaneously, producing a 16-bit output, F/16. 



Figure 3.9: The NOT_16 gate handles a 16-bit bus input, negates the value of each simultane¬ 
ously, and yields output F/16. 
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To complete a fully functioning simple modern computer, we continued to build additional 
logic chips as specified by Nisan and Schocken. The computer they designed is called 
the HACK machine, and we largely follow their designed machine—only creating it in 
SystemC instead of HDL. Just as we did with the NOT_16 gate, we combine other smaller 
logic gates to form logic circuits that can handle 16-bit bus inputs and produce 16-bit bus 
outputs. This effort quickly produced the following three logic gates: the AND_16 gate, 
the OR_16 gate, and a Mux_16 gate. They are illustrated in Figures 3.10, 3.11 and 3.12. 


A/16 

B/16 



F/16 


Figure 3.10: The AND_16 gate handles 16-bit bus inputs, "AND-ing" each of the input bits 
simultaneously 


A/16 

B/16 



F/16 


Figure 3.11: The OR_16 gate handles 16-bit bus inputs simultaneously and produces the output 
F/16. 


SEL 



F/16 


Figure 3.12: The Mux_16 gate handles two 16-bit bus inputs simultaneously to select a final 
output of F/16. 


An additional logic circuit named the OR_8 gate utilizes seven regular OR gates to select 
between eight boolean inputs, A, B , C, D , E, F, G, and H. The internal OR gates are orl, 
or2, orS, and or4. Their outputs are wired to or5 and or6. Finally, the outputs of or5 and 
or6 are input into or7, which yields the final output F. The logical implication of this circuit 


16 



is as follows: if any of the inputs is true, the output of the OR_8 will be true. The only time 
the OR_8 gate will produce a false or zero output is when all input booleans are also false. 
Figure 3.13 illustrates the internal composition of the OR_8 gate. 



Figure 3.13: The OR_8 gate produces a "true" bit output if any of the eight input bits are true. 
If all eight boolean inputs are "false" or zero values, the output of the OR_8 will be zero. 


3.9 Continuing the construction of larger logic gates 

The next logic gate we designed in SystemC was the Mux4wayl6 gate. The Mux4wayl6 
logic gate performs the function of selecting between four 16-bit bus inputs - A/16, B/16, 
C/16, and D/16. Also required in this effort are two select bits, SELO and SELL These 
select bits allow a selection to be made between the four options, which yields the final 
output F_16. Internally, two additional 16-bit buses, S/16 and T/16, are required. Figure 
3.14 illustrates the internal composition of the Mux4wayl6 gate. 

Progressing sequentially in our design, the next logical step was creating a Mux8wayl6 
gate. Just as the name suggests, the Mux8wayl6 gate selects a final output bus from eight 
16-bit input buses. The construction of the Mux8wayl6 in SystemC requires eight input 
buses, named A/16, B/16, C/16, D/16, E/16, F/16, G/16, and H/16, and three select bits 
{SELO, SEL1, and SEL2). Two Mux4wayl6 gates, mO and ml, and one Mux_16 gate, m2, 
are utilized internally to produce the desired outcome. The composition of the Mux8wayl6 
is illustrated in Figure 3.15. 
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SELO 


SEL1 



Figure 3.14: The Mux4wayl6 selects between four 16-bit input buses and produces one 16-bit 
output bus. 


3.10 Once you Mux, it’s easy to D-Mux 

Just as we expanded the multiplexors to handle more inputs, demultiplexors can also be 
expanded to handle more outputs. A demultiplexer performs the inverse function of the 
multiplexor, taking a single input, IN, and directing it to one of N outputs. Figure 3.16 
demonstrates a 4-bit D-Mux. Our SystemC implementation utili z es three D-Mux gates, dl, 
d2, and d3, and two select bits, SELO and SEL1, to construct the D-Mux4way. Two internal 
booleans, S and T, are also used. The four final outputs are labeled A, B, C and D. The 
composition of the D-Mux4way is illustrated in Figure 3.16. 

To extend the capabilities of the demultiplexer to handle eight outputs, we created the 
Dmux8way gate. The Dmux8way consists of one D-Mux2way gate, dl, and two D- 
Mux4way gates, c/2 and d3. Three input selectors, SELO, SEL1 and SEL2, are also required. 
The eight outputs are labeled A, B, C, D, E, F, G, and H. Figure 3.17 shows the logical 
construction of an 8-way demultiplexer. 
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A/16 

B/16 

C/16 

D/16 

E/16 

F/16 

G/16 

H/16 



0/16 


Figure 3.15: The Mux8wayl6 selects its final outcome choice 0/16 from among eight 16-bit 
buses. 


SEL1 SELO 



Figure 3.16: The D-Mux4way directs an incoming bit to one of four outputs. 
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SELO 



A 

B 

C 

D 

E 

F 

G 

H 


Figure 3.17: The D-Mux8way directs an incoming bit to one of eight outputs. 


3.11 Time for some simple addition: Introducing the 
Half-Adder and Full-Adder 

As we move forward with the construction of the HACK machine using SystemC, per¬ 
forming arithmetic operations is necessary. For the simplest arithmetic, addition, we are 
able to implement this capability in two steps: constructing a HalfAdder, followed by the 
construction of a FullAdder. 



To make the HalfAdder in SystemC, two boolean inputs, A and B , are required, and the 
operation yields two boolean outputs, SUM and CARRY. Internal composition of the Half- 
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Adder gate requires one XOR gate and one AND gate. Their composition is shown in Figure 
3.18. 


Now that the Half Adder has been constructed, assembling the FullAdder becomes possible. 
The FullAdder is built by assembling together two Half Adders (hal and ha2) and an OR 
gate (or) and then routing in three boolean inputs (A, B and CIN), utilizing three internal 
boolean signals (S, T, and U), and ultimately producing two boolean outputs - SUM and 
COUT. The diagram in Figure 3.19 illustrates the assembly of the FullAdder. 



Figure 3.19: The FullAdder 


3.12 Preforming addition on input buses 

Now equipped with the ability to preform addition, our next goal consisted of performing 
addition on 16-bit values. Much like we did with previous implementations of logic chips 
for larger buses, the construction of the ADD_16 gate contains sixteen FullAdders in order 
to process the operands. 

The parts of the ADD_16 gate are arranged as follows using SystemC: two 16-bit input 
buses, A/16 and B/16, the operands to be added together. Each of the individual FullAdder 
gates - faO, fal , fa2, fa3, fa4, fa5, fa6, fa7, fa8, fa9, falO, fal 1 , fall, fal3, fal4, and fal5 
- generates an internal carry-out signal ( c-out[n]), which is passed to the next sequential 
FullAdder gate. The last carry-out bit is discarded. The SUM is output from each internal 
FullAdder then placed on a bus. Figure 3.20 illustrates the composition of the ADD_16 
circuit. 

Continuing our efforts to enhance the HACK machine’s ability to perform arithmetic op¬ 
erations, implementing an incrementor was the next logical step. Our incrementor, named 
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(SUM) F[0] 
(SUM) F[l] 
(SUM) F[2] 
(SUM) F[3] 
(SUM) F[4] 
(SUM) F[5] 
(SUM) F[6] 
(SUM) F[7] 
(SUM) F[8] 
(SUM) F[9] 
(SUM) F[10J 
(SUM) F[11J 
(SUM) F[12J 
(SUM) F[13] 
(SUM) F[14] 
(SUM) F[15J 


Figure 3.20: The ADD_16 gate adds two 16-bit values together. 


INC_16, is designed to handle an arbitrary 16-bit boolean value and increase (or incre¬ 
ment) it by one. Designing the INC_16 gate in SystemC requires one ADD_16 gate. An 
abstraction of the implementation of INC_16 is shown in Figure 3.21. 

Another logic gate needed for the HACK machine is the Controlled_Zerol6. The Con- 
trolled_Zerol6 circuit accepts any 16-bit input, IN/16, and proceeds, when instructed by 
the select bit C, to zero all bits of the bus, producing an output bus, OUT/16, of all zeros. 
If the C select bit is not asserted, the inputted value will be outputted as OUT/16 without 
any alteration to the value. To build the Controlled_Zerol6 circuit in SystemC, one AND16 
gate, one Muxl6 gate, and one NOT/16 are utilized. Two internal boolean buses, S/16 and 
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IN/16 


ONE/16 


ADD_16 


OUT/16 


Figure 3.21: The INC_16 circuit increments an inputted value, IN, by one. 


T/16 , are also used. The logical layout of the Controlled_Zerol6 circuit is illustrated in 
Figure 3.22. 


(SEL) C 


IN/16 



OUT/16 


Figure 3.22: The Controlled_Zerol6 gate can either zero out the entire inputted value or output 
the original input unaltered. 


The next digital logic circuit we designed in SystemC is the ControlledJNot gate. Very 
similar to the Controlled_Zerol6 circuit, the ControlledJNot gate negates (or "flips") each 
of the 16-bit values it receives. The Controlled_Not circuit is constructed with one Muxl6 
and one Notl6 gate. The 16-bit input bus IN is spliced in two directions - one running 
through the NOT16 gate, nl, before being fed into the Muxl6, and the other being fed 
directly into the Mux 16. The select bit C choses which value to output as OUT/16, either 
the original inputted value or its negated/complemented value. The construction of the 
Controlled_Notl6 logical circuit is illustrated in Figure 3.23. 
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(SEL) C 



Figure 3.23: The Controlled_Notl6 will either flip each bit of the input bus or output the 
inputted value unaltered. 


3.13 Two more arithmetic circuits must be constructed 
prior to the ALU 

As we near the ability to construct an ALU for our HACK machine, only two more arith¬ 
metic circuits are needed. The first of these two is called the AND_or_ADD16 circuit. The 
AND_or_ADD16 circuit requires one AND16 gate, one ADD16 gate, and one Muxl6 gate. 
Inputs, A/16 and B/16 are wired to both the AND16 gate and the ADD16 gate. The out¬ 
puts from the AND 16 and the ADD 16 are then inputted into the Mux 16. The select bit C 
determines which of the two inputted values is selected for the output OUT/16. If the C bit 
is zero ("0"), the output of the AND 16 gate is passed through as the output. Otherwise, if 
the C is a one ("1"), the output of the ADD 16 is passed through as the final result, OUT/16. 
Figure 3.24 illustrates the logical arrangement of the AND_or_ADD16 circuit. 

(SEL) C 

OUT/16 

Figure 3.24: The AND_or_ADD16 circuit 
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The last logic circuit required before implementing the ALU is the check_16. The check_16 
produces a 16-bit boolean output bus, as well as two single-bit boolean output signals, ZR 
and NG. To construct the check_16 circuit in SystemC, fifteen OR gates, sixteen AND gates, 
and one NOT gate are used (alternatively, two Or8way gates, one two-way OR gate and one 
NOT gate can also be used). The internal wiring is illustrated in Figure 3.25. 


IN[0] 
IN[1] 
IN [2] 
IN [3] 
IN [4] 
IN [5] 
IN [6] 
IN [7] 
IN [8] 
IN [9] 
IN[ 10] 
IN[ 11] 
IN[ 12] 
IN[ 13] 
IN[ 14] 
IN[15] 



Figure 3.25: The Check_16 circuit 
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OUT[l] 

OUT[2] 

OUT[3] 

OUT [4] 

OUT[5] 

OUT [6] 

OUT [7] 

OUT[8] 

OUT [9] 

OUT[10] 

OUT[ll] 

OUT [12] 
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OUT[14] 

OUT[15] 

ZR 

NG 
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3.14 Time to build the ALU 


We now have all digital logic tools required to build the HACK’S ALU (Arithmetic Logic 
Unit). While "The centerpiece of the computer’s architecture is the CPU... the centerpiece 
of the CPU is the ALU, or Arithmetic-Logic Unit" [12]. 

The ALU executes all the arithmetic and logical operations performed by a computer. De¬ 
pending on what operations a computer designer wants his/her machine to calculate, the 
exact operations of the ALU may vary from one computer design to another. In our case, 
we are designing the HACK machine as specified by Nisan and Schocken. Nisan and 
Schocken state, "The Hack ALU computes a fixed set of functions out =fi(x, y) where x 
and y are the chip’s two 16-bit inputs, out is the chip’s 16-bit output, and //' is an arithmetic 
or logical function selected from a fixed repertoire of eighteen possible functions. We in¬ 
struct the ALU which function to compute by setting six input bits, called control bits , to 
selected binary values" [12]. 
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To construct the ALU in SystemC, two boolean input buses X/16 and Y/16 as well as six 
input signals (ZX. ZY, NY, NY, F, and NO) were used to produce the output bus OUT/16 
and two output signals ZR and NG. The digital logic circuits utilized included two Con- 
troledjzerol6s (named czl and cz2), three Controlled_notl6 chips (named cnl, cn2 and 
cn3), an AndORaddl6 unit (referred to as ad) and a checkl6 circuit (simply named check). 
Figure 3.26 provides a visual representation of the ALU’s logical construction. 


3.15 Constructing more hardware: a single-bit and 16-bit 
register 

Next, we designed a single bit register and then built a larger, 16-bit register. To construct 
the single-bit register in SystemC, a single boolean input IN was fed in, and boolean input 
signals LOAD and CLOCK were also fed in. The MUX logic gate, as well as a digital 
flip-flop (commonly referred to as a "dff") were also used. Two internal signals S and T 
were used, and the output signal was OUT. See Figure 3.28 for a visual representation of 
the single-bit register compilation. 


LOAD CLOCK 



Figure 3.27: The internal composition of a single-bit register. 

With the single-bit register constructed, we were able to combine sixteen together. The 
16-bit register (also simply referred to as register) consists of 16 single-bit registers (bO- 
bl5), a 16-bit input bus, a LOAD bit, and a CLOCK bit. Its full logical layout is shown in 
Figure 3.28. 


27 




LOAD CLOCK 


IN[0] 

IN[1] 

IN[2] 

IN[3] 

IN[4] 

IN[5] 

IN[6] 

IN[7] 

IN[8] 

IN[9] 

IN[10] 

IN[11] 

IN[12] 

IN[13] 

IN[14] 

IN[15] 


- bit-register, bO - 

- bit-register, bl - 

- bit-register, b2 - 

- bit-register, b3 - 

- bit-register, b4 - 

- bit-register, b5 - 

- bit-register, b6 - 

- bit-register, b7 - 

- bit-register, b8 - 

- bit-register, b9 - 

- bit-register, b 10 - 

- bit-register, bll - 

- bit-register, b 12 - 

- bit-register, b 13 - 

- bit-register, b 14 - 

- bit-register, b 15 - 


OUT[0] 

OUT[l] 

OUT[2] 

OUT[3] 

OUT[4] 

OUT[5] 

OUT[6] 

OUT[7] 

OUT[8] 

OUT[9] 

OUT[10] 

OUT[l 1] 

OUT[12] 

OUT[13] 

OUT[14] 

OUT[15] 


Figure 3.28: A 16-bit register is capable of holding the 16-bit input values HACK uses to operate. 


3.16 Let’s store some memory 

Having constructed the 16-bit register, we now can construct the random access memory 
(RAM). The RAM8 unit allows us to address eight words of memory. Once we have chosen 
what address we want, we can either read information from the location or write informa¬ 
tion to the location. We use recursive ascent to build larger memories. The following 
demonstrate the recursive ascent approach. 

To construct the RAM8, eight 16-bit registers, a D-mux_8, and a Mux8wayl6 were utilized. 
Additionally, three boolean signals (LOAD, CLOCK, and ADDR) were needed. The circuit 
is shown in Figure 3.29. 

To build the RAM64, eight RAM8’ s, a D-mux_8, and a Mux8wayl6 are utilized. A boolean 
input array IN/16 is routed into the RAM64 circuit together with a LOAD and CLOCK bit. 
The circuit is shown in Figure 3.30. 
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AD 


/16 



OUT/16 


Figure 3.29: RAM8 


OAD 


N/16 


ADDR5 

ADDR4I 

ADDR3I 


ADDRs 0, 1, & 2 - run to each RAM8 circuit (r0-r7) 



OUT/16 


Figure 3.30: RAM64 
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3.17 Making more and more memory 


With the RAM64 now implemented, the next step was to continue to increase memory 
size. We did so, building the RAM512 and ultimately the RAM4K. While building the 
RAM4K we found the memory resources of our own computer to be extremely strained 
while running the simulation, ultimately being forced to use a machine with 16-GBytes of 
RAM (the simulation of the RAM4K used 9-GBytes). Figures 3.31 and 3.32 illustrate the 
RAM512 and RAM4K. Because the memory requirement became so intense, we elected 
to stop constructing larger memory sizes. 
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LOAD 


IN/16 


ADDR11 

ADDR1Q 

ADDR9I 


ADDRs 0-8 directly run to each RAM_512 circuit 
CLOCK 



OUT/16 


Figure 3.32: RAM4K 


3.18 The Program Counter 

Now that we have constructed memory, our next task as we work towards building the 
HACK machine is to create a program counter. The program counter accepts as input a 
boolean bus IN/I 6 and boolean signals LOAD, INC, RESET, and CLOCK and produces a 
16-bit boolean output OUT/16. The program counter utili z es a muxl6, controIIed_zerol6, 
one 16-bit register, and a INC_16. There are four internal boolean buses and two internal 
boolean signals inside the program counter. The logic design of the program counter is 
shown in Figure 3.33. 


3.19 Now it’s time to jump 

An integral part of any CPU is having the ability to jump while running a program. Thus, 
we next built the JumpDetermination circuit. There are three bits in each 16-bit instruction 
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LOAD 


RESET 


CLOCK 


INC 


IN/16 



OUT/16 


Figure 3.33: The "Program Counter" 


that specify whether or not to jump. These three bits are named Jlbit, J2bit, and the JSbit. 
You must also know if the instruction is a C-instruction; thus, we have the isCinstruction 
bit. The NG bit (informing us if the number is negative) and ZR bit are also needed. 

The JumpDetermination requires a total of 25 AND gates, seven NOT gates, and seven OR 
gates. The logic design is illustrated in Figure 3.34. 


3.20 Finally, the CPU 

We are finally ready to build the central processing unit, or CPU. The HACK computer, a 
von Newmann machine, stores data and instructions in memory. The machine language is 
called the HACK machine language. A comprehensive explanation of the HACK machine 
language is provided in Chapter 4 of The Elements of Computing Systems [12]. If the MSB 
is set to zero (0), the bits are interpreted as an address or data. If the MSB is set to one, the 
bits are interpreted as an instruction [12]. 

Instructions allow the CPU to know what operations the user wants performed - logical, 
arithmetic, etc. There are four main fields of HACK instructions (known as C-instructions). 
Figure 3.35 shows the fields. Seven bits ( a , cl, c2, c3, c4, c5, and c6 ) instruct the ALU 
what operation to preform. The destination bits ( dl , d2, and d3) inform the CPU where to 
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Figure 3.34: JumpDetermination 


send the ALU’s output, and the jump bits (jl,j2, and j3 ) determine whether a "jump" is 
needed. 


isC 


comp 


dest jump 



binary: 


1 

X 

X 

a 

cl 

c2 

c3 

c4 

c5 

c6 

dl 

d2 

d3 

jl 

j2 

J3 


Figure 3.35: The C-lnstruction’s four fields. 


The CPU evaluates the MSB, checking to see wether the bits represent a C-instruction or 
A-instruction. The CPU fetches (i.e., reads ) a word from the instruction memory, decodes 
it, and executes the specified instruction [12]. 
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Two figures illustrate the function of the CPU and its internal components. Figure 3.36 
shows the inputs and outputs of the CPU at a high level of abstraction. Figure 3.37 shows 
the internal components of the CPU and the logical connections needed. 



outM/16 
writeM/1 
addressM/15 
pc/15 


Figure 3.36: A high-level diagram of the CPU showing both inputs and outputs. 


instruction/16 
inM 

reset 



Figure 3.37: A low-level diagram of the CPU shows the required internal logical circuits. Each 
circled c refers to control logic. 
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3.21 And now the computer 

To accommodate SystemC constraints, we diverted from Nisan and Schocken in the follow¬ 
ing respects. First, the size of the memory (both instruction memory and data memory) was 
limited to 64 addresses (aka RAM64). When simulating HACK in SystemC with memory 
sizes larger than RAM64, simulation required much more than 4-Gbytes. However, the 
RAM64 size was sufficient to demonstrate small programs. The second deviation from the 
Nand2Tetris model was the lack of a pre-built ROM (Read-Only Memory). To address this 
problem, we used a RAM64 logic circuit as instruction memory. 

The HACK computer has three main components: an instruction-memory, the CPU, and 
data-memory. The first program we tested on the HACK computer was an addition pro¬ 
gram. This simple program adds the numbers two and three together. If all logic gates have 
been assembled and wired together correctly, the output will be five. This program requires 
six instructions; they are the following: 

//®2 

0000000000000010 

//D=A 

1110110000010000 

//®3 

0000000000000011 

//D=D+A 

1110000010010000 

//®0 

0000000000000000 

//M=D 

1110001100001000 

To make this program work, we assemble the computer as depicted in Figure 3.38, then to 
load the program into instruction memory, we load each instruction one-by-one, making 
sure that the SET bit is enabled and the GET bit disabled during the loading process. Next 
we run the program for a predetermined, fixed number of clock cycles, with SET and GET 
disabled. Finally, we read the contents of the data memory at address zero. This requires 
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one additional clock cycle with the GET bit enabled and the SET bit disabled. At the 
completion of the simulation, the contents inside address zero equal five, exactly what we 
expected. 


SET GET 



RESET 


Figure 3.38: The FIACK computer consists of three main parts: instruction memory, the CPU, 
and data memory. 
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CHAPTER 4: 

Experimental Setup and Results 


Our thesis project both constructs a simple modem computer utilizing SystemC and facil¬ 
itates the design and integration of policy enforcement circuitry. Having constructed the 
HACK computer described in the previous chapter and verified that it can correctly run a 
simple program, our next step was to run a more complex program and verify that the com¬ 
puter yielded a correct output. We chose to run the multiply program provided by Nisan 
and Schocken [12]. The instructions of the multiply program are provided below: 


// This is the multiply program - it is a simple test program for multiplying 
// two numbers. We will initialize the instruction memory with this program. 
// M[0] should contain the value 30 at the end of the program. 

// @5 

0000000000000101 
// D=A 

1110110000010000 

// @R1 

0000000000000001 
// M=D 

1110001100001000 
// @6 

0000000000000110 
// D=A 

1110110000010000 
// @R2 

0000000000000010 
// M=D 

1110001100001000 
// @R0 

0000000000000000 
// M=0 
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1110101010001000 
// @i 

0000000000010000 
// M=0 

1110101010001000 
// (LOOP) 

// @R1 

0000000000000001 
// D=M 

1111110000010000 
// ffli 

0000000000010000 
// D=D-M 
1111010011010000 
// SEND 

0000000000011010 
// D;JLE 

1110001100000110 
// 0R2 

0000000000000010 
// D=M 

1111110000010000 
// ORO 

0000000000000000 
// M=M+D 

1111000010001000 
// ffli 

0000000000010000 
// M=M+1 

1111110111001000 
// OLOOP 

0000000000001100 
// 0;JMP 

1110101010000111 
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// (END) 

// SEND 

0000000000011010 
// 0;JMP 

1110101010000111 


The multiply program consists of 28 instructions; thus, the program length is set at 28. 
Unlike the simple addition program in Chapter 3, the multiply program also contained 
"looping" via conditional jumps. Because of this feature, we had to increase the max value 
to allow enough cycle-iterations to complete the program. We chose to set max to a value 
of 128. At the conclusion of the 128 clock cycles, the value of M[0] was thirty, exactly 
what was expected. 

With the outputs of both programs ( addition and multiply ) yielding correct results, our 
confidence in the creation of the HACK machine was sufficient to move to the second 
portion of our project—the design of policy enforcement circuitry. To demonstrate built-in 
policy enforcement capabilities, we designed an integrity checker. 


4.1 Creating the Integrity Checker 

The integrity checker protects the HACK machine’s data memory against unauthorized 
modifications. Our simple demonstration assumes that the first four lines of instruction 
code come from an "untrusted" source; thus, we want to prevent the instructions from 
writing to unauthorized memory locations. We chose to allow untrusted code to write to 
the first sixteen data memory addresses, but to deny the untrusted code from writing to any 
other addresses. 

We constructed the integrity checker using four OR-gates, five AND-gates, and six NOT- 
gates. The program counter’s second, third, fourth, and fifth bits are fed in to check if the 
instruction resides in the secure or insecure portion of the code. If the integrity checker 
(i.e., "checker") detects the value in the program counter to be less than four, it will deny 
writes to data addresses greater than or equal to sixteen. If any of the values of pc2, pc3, 
pc4, or pc5 is one, this indicates that the instruction comes from the trusted portion of the 
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code; thus, the program is allowed to write to any location in data memory. Figure 4.1 
illustrates the logical layout of the integrity checker described above. 



allow 


Figure 4.1: The Integrity Checker. 


4.2 Incorporating the Integrity Checker in the HACK 
computer design 

Integrating the integrity checker into the HACK computer design is straightforward, merely 
affecting the "writeM" bit, which proceeds from the CPU and is fed into mux2. The mux2 
gate accommodates the GET bit. The output of the mux2 gate is then "anded" with the 
output of the checker. If the writeM bit is enabled and the allow -bit proceeding from the 
checker is also enabled, the output of the CPU, outM , will be written into data-memory at 
the specified address. Conversely, if the integrity checker detects that an instruction in the 
untrusted portion of the code is trying to write to a prohibited address, the allow-bit will be 
disabled , and writing to data-memory will be denied. 
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SET GET 



RESET 


Figure 4.2: The FIACK computer with the "INTEGRITY CHECKER" installed. 

4.3 The Test 

To demonstrate that the integrity checker correctly prevents writes to unauthorized loca¬ 
tions, we created and ran the following test program: 

//(UNTRUSTED) 

//@R0 

0000000000000000 
//M=l 

1110111111001000 

//@16 

0000000000010000 
//M=l 

1110111111001000 

//(TRUSTED) 

//®7 
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0000000000000111 

//D=A 

1110110000010000 

//@R0 

0000000000000000 

//M=D 

1110001100001000 

//@16 

0000000000010000 

//M=D 

1110001100001000 

//(END) 

//SEND 

0000000000001010 
//0; JMP 

1110101010000111 

As previously discussed, the first four instructions are considered untrusted. The next eight 
instructions are considered trusted. To validate the functionality of the checker, we first 
ran the test code in the HACK machine without installing the integrity checker. After 
two clock-cycles, the program should write the value of one (1) into M[0] (data-memory 
address zero). After four clock-cycles, the program will write a value of one (1) into M[16]. 
After eight clock-cycles, the program will write a value of seven (7) into M[0], and after 
ten clock-cycles the value of seven will also be written into M[16]. Running the program 
yielded the expected results, as shown in Table 4.1. For this program, we set max to twenty- 
four. 


Memory Address 

Clock-cycles 

Data-Values 

M[0] 

2 

1 

M[16] 

4 

1 

M[0] 

8 

7 

M[16] 

10 

7 


Table 4.1: Memory values after running the "test" program in the HACK machine without 
installing the integrity checker. 
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We next ran the "test" program on the HACK computer with the integrity checker installed. 
Untrusted code can still write to M[0-15] of data-memory, but not to data-memory locations 
greater than or equal to sixteen. As shown in Table 4.2, M[0] has a value of one after two 
clock-cycles, M[16] remains uninitialized after four clock-cycles, and the values of M[0] 
and M[16] will be set to seven by the trusted instructions as before. 


Memory Address 

Clock-cycles 

Data-Values 

M[0] 

2 

1 

M[16] 

4 

uninitialized 

M[0] 

8 

7 

M[16] 

10 

7 


Table 4.2: Memory values after running the "test" program with the checker installed. 


4.4 Impact on system performance 

To measure the impact of the integrity checker on system performance, we run the test 
program both with and without the integrity checker installed. We measure simulation 
time using the UNIX ’time’ command. The timing results are listed in Tables 4.3 and 4.4, 
with max set to a value of twenty-four. Installing the integrity checker has minimal impact 
on system performance (Averages with checker installed - Real: 2.608, User: 2.276, and 
System: 0.285, versus averages without checker installed - Real: 2.601, User: 2.275, and 
System: 0.284). 


Tests run without security mechanism 

Test One 

Test Two 

Test Three 

Test Four 

Real 

2.616 

2.603 

2.602 

2.581 

User 

2.279 

2.278 

2.263 

2.280 

System 

0.286 

0.277 

0.289 

0.283 


Table 4.3: Timing results of running test program without inclusion of the integrity checker. 


Tests run with security mechanism 

Test One 

Test Two 

Test Three 

Test Four 

Real 

2.608 

2.609 

2.620 

2.598 

User 

2.273 

2.275 

2.290 

2.269 

System 

0.289 

0.285 

0.286 

0.282 


Table 4.4: Timing results of running test program with the inclusion of the integrity checker. 
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CHAPTER 5: 

Conclusion and Future Work 


The growth in system complexity enabled by Moore’s Law poses major challenges for de¬ 
signers to ensure correct operation, security, fault tolerance, and other key properties. Since 
electronic design automation (EDA) tools (e.g., logic synthesis) first appeared decades ago, 
the complexity of both hardware and software has risen dramatically. To tackle this prob¬ 
lem, high-level synthesis (HLS) is a design methodology being embraced by major chip 
manufacturers, including Xilinx, which incorporates HLS into its Vivado design software 
suite. Though Vivado does not use SystemC, it uses another C-like language to express 
both hardware and software. HLS is related to electronic system-level (ESL) design, which 
allows a chip designer to express an algorithm in a high-level language, and the design tools 
automatically translate the high-level specification into a cycle-accurate system. Since de¬ 
signers wishing to build a secure system must be able to fully comprehend the system and 
have mastery over the design tools that practitioners are using, HLS has the potential to 
facilitate trustworthy system development by helping designers efficiently express hard¬ 
ware and software components for computation, communication, security, reliability, etc. 
Also, since modeling languages like SystemC allow designers to express both hardware 
and software, they facilitate co-simulation of hardware and software in the same simula¬ 
tion environment, which offers the potential of greater efficiency than traditional methods. 

To explore the application of HLS to trustworthy system development, this thesis applied 
a HLS design approach to a simple 16-bit computer, implementing it in the popular Sys¬ 
temC modeling language. With only a C++ compiler and the SystemC library, we were 
able to design and simulate the CPU, instruction memory, and data memory. Compiling 
the C++ code and linking with the SystemC library generates an executable, and running 
the executable performs a simulation of the hardware. While simulating the computer’s 
hardware, we were able to run programs in the machine language of the CPU on the sim¬ 
ulated hardware. We had to reduce the size of our simulated computer’s memory due to 
the well-known problem of fine-grained simulations requiring large amounts of memory 
as system complexity increases. We integrated a simple memory integrity policy enforce¬ 
ment mechanism, also designed in SystemC, into our design. Our simulation results show 
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that the mechanism correctly enforces the integrity policy and imposes minimal impact on 
overall system performance. 

We see many opportunities for future work. For example, it would be worthwhile to explore 
more complex processor designs and more sophisticated security policy enforcement mech¬ 
anisms. It would also be useful to leam more about the capabilities of production-grade, 
tape-out tools l ik e Vivado HLS (e.g., modeling a system with Vivado and prototyping it on 
an inexpensive FPGA board, such as the Basys 2 board from Digilent). We would also like 
to overcome the technical hurdles to more fully implement the 16-bit design, including a 
display, keyboard, and larger memory. We would like to perform more optimizations on 
our 16-bit design to make it more efficient and to compare its implementation in SystemC 
against the same design expressed in traditional HDL. We would like to become more pro¬ 
ficient in SystemC so that we can more efficiently and elegantly express circuits. We would 
also like to express our design in other languages, environments, and tool flows. Finally, 
the broader impact of our work aims to make it easier for Computer Scientists to leverage 
custom hardware’s benefits. 
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